TREC 2002 Cross-lingual Retrieval at BBN
Abstract
Two sets of parameters are important in the retrieval model. One is the set of translation probabilities P(tq|td). In TREC 2001, for efficiency reasons, we estimated term translation probabilities from a parallel corpus using Model 1 of Brown's statistical MT work (Brown et al., 1993). With more computing power at our disposal, for TREC 2002 we used the more complex but potentially more accurate Model 4 for the same purpose. The differences between the two models are discussed by Brown et al. (1993).
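To make the idea of estimating P(tq|td) from a parallel corpus concrete, the following is a minimal sketch of Model 1 style EM training on a toy corpus. It is not BBN's implementation: the function name `train_model1`, the toy sentence pairs, the iteration count, and the reading of tq as a query-language term and td as a document-language term are all assumptions made for illustration. The NULL source word of the full Model 1 is omitted for brevity, and Model 4 (with fertility and distortion) is considerably more involved and is not shown.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t[(tq, td)] ~ P(tq | td) with Model 1 style EM.

    pairs: list of (doc_terms, query_terms) token lists, one per aligned
    sentence pair. Hypothetical helper; the NULL word is omitted.
    """
    query_vocab = {tq for _, q_side in pairs for tq in q_side}
    uniform = 1.0 / len(query_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (tq, td) alignments
        total = defaultdict(float)  # expected count of td being aligned
        # E-step: distribute each query-side token over document-side tokens.
        for d_side, q_side in pairs:
            for tq in q_side:
                norm = sum(t[(tq, td)] for td in d_side)
                for td in d_side:
                    frac = t[(tq, td)] / norm
                    count[(tq, td)] += frac
                    total[td] += frac
        # M-step: renormalize so that sum over tq of P(tq | td) equals 1.
        t = defaultdict(
            float,
            {(tq, td): c / total[td] for (tq, td), c in count.items()},
        )
    return t


# Toy parallel corpus (assumed data): German "document" side, English "query" side.
toy_pairs = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

probs = train_model1(toy_pairs)
for (tq, td), p in sorted(probs.items(), key=lambda kv: (kv[0][1], -kv[1])):
    print(f"P({tq} | {td}) = {p:.3f}")
```

On this toy data, co-occurrence statistics alone are enough for EM to favor "house" over "the" as the translation of "haus". A real system would train on a large parallel corpus and typically prune low-probability entries before retrieval.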
Similar resources
TREC 2001 Cross-lingual Retrieval at BBN
BBN participated only in the cross-lingual track in TREC 2001. Arabic, the language of the TREC 2001 corpus, presents a number of challenges to both monolingual and cross-lingual IR. First, many inflected Arabic words can correspond to multiple uninflected words, requiring context to disambiguate them. Second, orthographic variations are prevalent; certain glyphs are sometimes written as differe...
English-Chinese Cross-Lingual Retrieval Using a Translation Package
Using a COTS English-Chinese bidirectional translation software package together with our PIRCS bilingual retrieval system, we performed English-Chinese cross-lingual retrieval experiments using the TREC Chinese collections and queries. With some simple approaches, we are able to attain about 67% of the effectiveness of the monolingual Chinese results.
Cross-lingual Information Retrieval Using Hidden Markov Models
This paper presents empirical results in cross-lingual information retrieval using English queries to access Chinese documents (TREC-5 and TREC-6) and Spanish documents (TREC-4). Since our interest is in languages where resources may be minimal, we use an integrated probabilistic model that requires only a bilingual dictionary as a resource. We explore how a combined probability model of term t...
IIT at TREC-10
For TREC-10, we participated in the ad hoc and manual web tracks and in both the site-finding and cross-lingual tracks. For the ad hoc track, we did extensive calibrations and learned that combining similarity measures yields little improvement. This year, we focused on a single high-performance similarity measure. For site finding, we implemented several algorithms that did well on the data provi...
Publication date: 2002